[Cherry-Pick][BugFix] fix MTP bugs in TP and overlap(#7172) by huicongyao · Pull Request #7192 · PaddlePaddle/FastDeploy

huicongyao · 2026-04-03T10:05:42Z

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

💡 如若此PR是Cherry Pick，PR标题需遵循格式，在最开始加上[Cherry-Pick]标签，以及最后面加上原PR ID，例如[Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2026-04-03T10:05:48Z

Thanks for your contribution!

huicongyao · 2026-04-03T10:12:11Z

/re-run approval

huicongyao · 2026-04-04T11:48:11Z

/re-run ci_xpu

fastdeploy-bot

🤖 AI Code Review | 2026-04-09 10:55 CST

📋 Review 摘要

PR 概述：Cherry-Pick PR，修复 MTP（Multi-Token Prediction）在 TP 和 overlap 场景下的 bug

变更范围：speculative decoding 相关自定义算子、GPU model runner

影响面 Tag：[Speculative Decoding] [BugFix] [OP]

📝 PR 规范检查

PR 标题符合 Cherry-Pick 规范，但描述部分 Motivation 和 Modifications 为空，建议补充。

描述模板（建议补充）：

## Motivation

修复 MTP 在以下场景下的 bug：
1. TP（Tensor Parallelism）模式下，非 rank 0 的重复采样输出被发送到消息队列，导致资源浪费
2. Overlap scheduling 场景下，`_predict_next_launch_token_num()` 的 token 数量计算方式不正确

## Modifications

1. `speculate_save_output.cc` 和 `speculate_save_output_with_topk.cc`：
   - 移除 `save_each_rank` 条件，只保留 `rank_id > 0` 检查
   - 在 TP 模式下，所有 ranks 共享相同的采样输出，只需 rank 0 发送结果

2. `gpu_model_runner.py`：
   - 简化 `next_launch_token_num` 的计算方式
   - 从 `seq_lens.sum() + is_block_step.sum() * token_num_one_step` 改为 `next_real_bsz * token_num_one_step`

问题

级别	文件	概述
❓ 疑问	`speculate_save_output.cc:41`	`save_each_rank` 参数语义是否仍需要保留？

总体评价

代码逻辑修改基本合理，针对 TP 场景的修复是正确的。但存在一个潜在疑问：如果 speculate 场景支持 EP（Expert Parallelism）模式，移除 save_each_rank 检查会导致非 rank 0 的输出丢失。建议确认 speculate 场景下 EP 的使用情况，或在注释中明确说明此修改的适用范围。

fastdeploy-bot · 2026-04-09T10:55:36Z

custom_ops/gpu_ops/speculate_decoding/speculate_save_output.cc

-  if (!save_each_rank && rank_id > 0) {
+  // NOTE(yaohuicong): Skip non-zero TP ranks — they share identical sampling
+  // outputs, so only rank 0 needs to send results to the message queue.
+  if (rank_id > 0) {


❓ 疑问 save_each_rank 参数的语义已被移除（从 if (!save_each_rank && rank_id > 0) 改为 if (rank_id > 0)），这意味着无论 save_each_rank 的值如何，只有 rank 0 会发送结果到消息队列。

潜在影响：如果 speculate 场景支持 EP（Expert Parallelism）模式，在 EP + TP 混合模式下，不同 EP rank 的输出是不同的，移除此检查会导致非 rank 0 的输出丢失。

建议：

确认 speculate decoding 场景下是否支持 EP 模式

如果不支持 EP，建议在注释中明确说明此修改仅适用于 TP 模式

如果支持 EP，需要考虑是否应该保留 save_each_rank 检查

另外，XPU 版本的 speculate_save_output.cc（custom_ops/xpu_ops/src/ops/mtp/speculate_save_output.cc）仍然保留原始逻辑，建议考虑是否需要同步修改以保持跨硬件一致性。

fastdeploy-bot · 2026-04-09T10:55:36Z

custom_ops/gpu_ops/speculate_decoding/speculate_save_output_with_topk.cc

                              int64_t rank_id,
                              bool save_each_rank) {
-  if (!save_each_rank && rank_id > 0) {
+  // NOTE(yaohuicong): Skip non-zero TP ranks — they share identical sampling


同上，save_each_rank 参数的语义已被移除。如果此函数在 EP 模式下被调用，可能会导致数据丢失。

fastdeploy-bot · 2026-04-09T10:55:36Z

fastdeploy/worker/gpu_model_runner.py

-        next_launch_token_num = (
-            seq_lens_this_time_cpu.sum().item() + is_block_step_cpu.sum().item() * token_num_one_step
-        )
+        next_launch_token_num = next_real_bsz * token_num_one_step


🟡 建议 此简化看起来是合理的，因为在 speculate decoding 场景下，每个 sequence 在每一步处理的 token 数量应该是固定的（token_num_one_step）。

不过，建议添加注释说明原始计算方式的问题，例如：

# In MTP (Multi-Token Prediction) mode, each sequence processes a fixed number of # tokens per step (num_speculative_tokens + 1), so we can simplify the calculation # from seq_lens.sum() + is_block_step.sum() * token_num_one_step to # next_real_bsz * token_num_one_step.

huicongyao added 2 commits April 3, 2026 11:17

fix MTP bugs in TP and overlap

be9287d

fix

278ba14

huicongyao had a problem deploying to Metax_ci April 3, 2026 10:05 — with GitHub Actions Failure

huicongyao changed the title ~~[Cherry-Pick][BugFix] fix MTP bugs in TP and overlap~~ [Cherry-Pick][BugFix] fix MTP bugs in TP and overlap(#7172) Apr 3, 2026

freeliuzc approved these changes Apr 8, 2026

View reviewed changes

freeliuzc merged commit 36909bf into PaddlePaddle:release/2.6 Apr 8, 2026
93 of 106 checks passed

fastdeploy-bot reviewed Apr 9, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Cherry-Pick][BugFix] fix MTP bugs in TP and overlap(#7172)#7192

[Cherry-Pick][BugFix] fix MTP bugs in TP and overlap(#7172)#7192
freeliuzc merged 2 commits intoPaddlePaddle:release/2.6from
huicongyao:MTP

huicongyao commented Apr 3, 2026 •

edited

Loading

Uh oh!

paddle-bot bot commented Apr 3, 2026

Uh oh!

huicongyao commented Apr 3, 2026

Uh oh!

huicongyao commented Apr 4, 2026

Uh oh!

Uh oh!

fastdeploy-bot left a comment

Uh oh!

fastdeploy-bot Apr 9, 2026

Uh oh!

fastdeploy-bot Apr 9, 2026

Uh oh!

fastdeploy-bot Apr 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

huicongyao commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Modifications

Usage or Command

Accuracy Tests

Checklist

Uh oh!

paddle-bot bot commented Apr 3, 2026

Uh oh!

huicongyao commented Apr 3, 2026

Uh oh!

huicongyao commented Apr 4, 2026

Uh oh!

Uh oh!

fastdeploy-bot left a comment

Choose a reason for hiding this comment

📋 Review 摘要

📝 PR 规范检查

问题

总体评价

Uh oh!

fastdeploy-bot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

fastdeploy-bot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

fastdeploy-bot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

huicongyao commented Apr 3, 2026 •

edited

Loading